Introduction: Compiling and analysing the Spoken British National Corpus 2014
Abstract
For over twenty years, the British National Corpus has been one of the most widely known and used corpora. It is almost impossible to attend an international corpus linguistics conference such as Corpus Linguistics, ICAME (International Computer Archive of Modern and Medieval English), AACL (American Association for Corpus Linguistics) or APCLC (Asia Pacific Corpus Linguistics Conference) without encountering several papers which in some way employ the BNC. Focusing on the 10-million-word spoken component of the BNC, Love et al. (this issue) show that no other orthographically transcribed spoken corpus compiled since the release of the BNC has matched the Spoken BNC in either its size or availability. Unsurprisingly, the corpus linguistics community has, for some time, used the Spoken BNC as a proxy for “present-day” spoken British English. That the “go-to” dataset is over twenty years old, as Love et al. (this issue) argue, is a problem for current and future research that needs to be addressed with increasing urgency. The collaboration between Cambridge University Press (CUP) and the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University to build the Spoken BNC2014 came about after some years of both centres working individually on the idea of addressing this situation by compiling a new corpus of spoken British English which could, in some way, match up to the Spoken BNC. Claire Dembry at CUP had collected two million words of new spoken data for the Cambridge English Corpus between 2012 and 2014, trialling the public participation method which was retained, along with the data itself, in the Spoken BNC2014.
Similar resources
Detecting gross alignment errors in the Spoken British National Corpus
The paper presents methods for evaluating the accuracy of alignments between transcriptions and audio recordings. The methods have been applied to the Spoken British National Corpus, which is an extensive and varied corpus of natural unscripted speech. Early results show good agreement with human ratings of alignment accuracy. The methods also provide an indication of the location of likely ali...
Vague Language and Interpersonal Communication: An Analysis of Adolescent Intercultural Conversation
This paper is concerned with the analysis of the spoken language of teenagers, taken from a newly developed specialised corpus, the British and Taiwanese Teenage Intercultural Communication Corpus (BATTICC). More specifically, the study employs a discourse analytical approach to examine vague language in an intercultural context among a group of British and Taiwanese adolescents, paying particul...
The Creation of a Spoken Sub-Corpus from the British National Corpus for Comparative Purposes
The British National Corpus (henceforth BNC) is one of the most frequently consulted corpora in linguistic research. While the use of this corpus is continuously on the increase, it appears that most BNC-related research work has exploited the corpus in its entirety, i.e. taking the corpus as a whole in analysing specific features or comparing with a different reference corpus. Despite the fact...
Designing a Multimodal Spoken Component of the Australian National Corpus
Spoken language and interaction lie at the core of human experience. The primary medium of communication is speech, with some estimating the ratio of spoken-written language to be as high as 90%-10% (Cermák, 2009, p. 115). Yet they have remained poor cousins in the building of corpora to date. Not only are spoken corpora much smaller than written corpora (Xiao, 2008), the overwhelming focus in ...